Top-Down Influences on Bottom-Up Processing


1.0  Introduction 

Although much progress was made in the '80s in understanding how surface
properties could be recovered from image data, minimal progress was made in
recognizing objects and events in natural scenes. The exception, of course, was
when the domain was well specified and the object classes were known in advance.
But without such knowledge, little progress was achieved in obtaining meaningful
descriptions of images in terms of objects and their behaviors. Indeed, even
indexing to the “correct” object category remained a formidable, largely unsolved
task. It became clear that a priori knowledge, to be applied “top down” to the
“bottom up” stream of information processors, would be necessary. Over the past
three years, this research has been aimed at understanding the structure of this
“top-down” knowledge.

The work may be conveniently divided into four different areas, plus a fifth
part that is a collection of “spinoffs” from the primary effort. The first major
area is an understanding of what image features support robust perceptual
categories or model classes that have powerful inductive leverage. Critical to
such key features is the notion of properties that have special, yet generic
structures in the world. The second major advance was a formal definition of a
percept - in other words, a definition of that state which offers a meaningful
description of an image (as opposed to simply a passive symbolic description).
Related to both of these areas is the work of Feldman on Perceptual Categories,
which constitutes the third major area. Finally, we have developed a new
psychophysical technique that allows us to probe the categorical structure and
special parameterizations of the perceptual feature space.

2.0  What  Makes  a  Good  Feature? 

Let the world consist of various properties P that are associated with various
contexts, C. Then p(P|C) denotes the conditional probability of a property, P,
such as “has 4 corners”, in the context C, which could be sitting “on a plane”,
“in this region”, etc. Similarly, the collection of measurements of a property
and their conditional probabilities will be specified by F and p(F|C). Note that
p(P|C) and p(F|C) are simply objective facts about the world and are not
statements about the perceiver's model of the world. Our first task is to place
conditions on F, P and C that ensure the measurements F constitute a reliable
indicator that P occurs in the world.
Figure 1 A: Flat distribution function. B: “Bimodal” distribution.

2.1 Reliable Inferences

The posterior probability of inferring property P given the feature F in context
C is p(P|F&C). A reliable inference makes this probability nearly one, and keeps
the probability of an “error”, i.e. p(notP|F&C), near zero. Hence a reliable
feature F, in context C, will keep the following ratio, namely R_post, much
larger than one:

R_post = p(P|F&C) / p(notP|F&C).  (1)

Using Bayes' Rule, R_post can be broken down into the product of two components,
a likelihood ratio L that relates to the “imaging” of P onto F, and the prior
probability ratio R_prior that relates to the genericity of the world property P
in context C. Specifically, R_post = L * R_prior, where

R_prior = p(P|C) / p(notP|C)  and  L = p(F|P&C) / p(F|notP&C).  (2)

Note that the likelihood ratio captures the intuition that a feature should arise
reliably from a given world property, i.e. L >> 1. As will be seen in the next
section, however, this condition does not ensure a reliable inference, because if
R_prior becomes too small, then R_post can become insignificant even in the
presence of a high likelihood ratio. (Also see Jepson & Richards, 1992.)
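
The decomposition in equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample probabilities are assumptions introduced here, not values from the original analysis.

```python
def posterior_odds(p_f_given_p, p_f_given_not_p, prior_p):
    """Return R_post = L * R_prior for a feature F and property P in context C."""
    likelihood_ratio = p_f_given_p / p_f_given_not_p  # L in equation (2)
    prior_odds = prior_p / (1.0 - prior_p)            # R_prior in equation (2)
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds R into the corresponding probability R / (1 + R)."""
    return odds / (1.0 + odds)


# Hypothetical numbers: the same likelihood ratio (L = 50) combined with
# a rare world property and with a common one.
rare = posterior_odds(1.0, 0.02, prior_p=0.005)
common = posterior_odds(1.0, 0.02, prior_p=0.5)
print(f"rare prior:   R_post = {rare:.3f}, p = {odds_to_probability(rare):.3f}")
print(f"common prior: R_post = {common:.1f}, p = {odds_to_probability(common):.3f}")
```

With the rare prior the posterior odds fall below one even though L is large, which is the failure mode the text warns about.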


2.2 An Example

Consider a world of line segments on a plane seen under orthographic view. Of
interest is the special property “two line segments are parallel”. Let the
threshold for discriminating the orientation difference between two (adjacent)
lines be θ, and let δ << θ be the limiting resolution of the process that governs
straight and parallel. Now let the collective distribution of the orientation φ
of all line segments be rather flat (Figure 1A). Given this context, we are now
presented with two lines that fall within the crosshatched sample for Δφ < θ.
Hence the two lines appear parallel; should we conclude that these lines indeed
arise from a parallel process?

First note that the likelihood ratio, L, is very high, because (i) whenever
parallel lines occur in the world, they always will appear parallel in the image,
and (ii) our chance of error is vanishingly small - when two lines are not
parallel, they will not be seen as such except in the rare case when they lie
within our limit of resolution θ. Hence p(F|P&C) = 1 and p(F|notP&C) is, say,
0.01 if θ is 1 part in 100. It appears therefore that we should infer that the
lines are indeed parallel in the world. However, given our chosen random world
context, such an inference is almost always guaranteed to be wrong.

Consider the prior probability ratio R_prior. Because the prior probability δ of
the parallel process occurring is much less than the resolution limit θ, the
area occupied by δ in Figure 1A is much less than the area set by θ. Thus
R_prior ≈ δ, and its product with the likelihood L ≈ 1/θ will give an
a posteriori probability ratio R_post < 1. Hence the odds really favor the
conclusion “not parallel”. (See Jepson & Richards, 1992, and Knill & Kersten,
1991, for further details and examples.) In order to raise R_post to a
significant level, we need significant priors, say a δ in this case such that
δ/θ >> 1. In terms of Figure 1, this is equivalent to requiring that the φ
distribution function for pairs of lines be biased, such as indicated in
Figure 1B where the process “parallel” appears as a mode in the probability
distribution function.
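
The parallel-lines argument can also be checked by simulation. The sketch below is a hypothetical Monte Carlo experiment, not part of the original study. It assumes that a “parallel” process fires with a given prior probability and produces exactly parallel pairs, that all other pairs have an orientation difference drawn uniformly from a normalized range [0, 1), and that the feature “looks parallel” means the difference falls below the threshold.

```python
import random


def fraction_truly_parallel(prior_parallel, threshold, n_pairs=200_000, seed=0):
    """Estimate p(parallel | pair looks parallel) by sampling line pairs.

    prior_parallel: assumed probability that the 'parallel' process made the pair
    threshold: resolution limit, as a fraction of the full orientation range
    """
    rng = random.Random(seed)
    looks_parallel = 0
    truly_parallel = 0
    for _ in range(n_pairs):
        is_parallel = rng.random() < prior_parallel
        # Parallel pairs differ by zero; every other pair draws its
        # orientation difference uniformly from the normalized range.
        difference = 0.0 if is_parallel else rng.random()
        if difference < threshold:
            looks_parallel += 1
            truly_parallel += is_parallel
    return truly_parallel / looks_parallel


# Flat world (Figure 1A): the parallel process is rarer than the threshold.
flat_world = fraction_truly_parallel(prior_parallel=0.001, threshold=0.01)
# Biased world (Figure 1B): the parallel process forms a substantial mode.
modal_world = fraction_truly_parallel(prior_parallel=0.2, threshold=0.01)
print(f"flat world:  {flat_world:.3f}")
print(f"modal world: {modal_world:.3f}")
```

Under these assumptions most apparently parallel pairs in the flat world are accidental, whereas in the modal world nearly all of them come from the parallel process.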


2.3  Two  Kinds  of  Regularities 

The important message of the previous example is that “good” features arise from
some modal regularity in the distribution function of world properties. However,
not all regularities satisfying the likelihood and prior conditions will be
useful. For example, the property “two skewed lines” satisfies these two
conditions, but clearly this property is not very informative. Hence what we
seek are properties that are not just arbitrary configurations, but rather ones
that are in some sense special.

To illustrate more clearly the fundamental difference between two skewed lines
and two parallel lines, we divide structural regularities into two classes:
transverse and non-transverse (Poston & Stewart, 1981). Transverse relations
arise when the elements of the model are positioned arbitrarily, such as the
above two skewed lines; non-transverse arrangements require careful positioning,
as implied by the term “non-accidental” of Binford (1981) and Lowe (1985). Unlike
the notion of “non-accidental”, however, the usage of transverse and
non-transverse requires a context. Thus, “two parallel lines” (or planes) in a
random stick (or planar) world would be non-transverse, but in the context of a
building with windows and doors, etc., the concept “parallel” would become
transverse. Within the proper context, non-transverse properties are thus very
special. But as we showed earlier, in order to be recoverable from image
features, the non-transversality must be an isolated spike in the distribution
function as in Figure 1B, with sufficient mass to be “visible”. This is what
previous researchers meant by “modal” properties (Bobick, 1987; Marr, 1970;
Richards & Bobick, 1988). Features that satisfy (2) and which arise from
non-transverse regularities provide especially reliable and useful inferences
about world properties and are called Key Features (see Jepson & Richards, 1992,
for “natural” examples taken from motion and color). Loosely speaking, F will be
a Key Feature for property P if P is a generic non-transverse mode in the space
of world models, and F occurs in the presence of P but never in its absence.
Hence the set of properties that image onto the Key Features are an especially
useful set of properties, because they are reliably inferable.

2.4 An Example

To illustrate a set of properties that image onto key features in our simplified
world of line segments in a plane, assume there are two processes that generate
two types of relations between two lines. One is the process “parallel”; the
other is a process “coincident”, where the lines just touch one another. We take
these regularities as generic - i.e. we stipulate that both occur with
significantly non-zero probabilities in the given context. First we enumerate
those regularities that image to key features. Then in the following section, we
will place an ordering on this special set of properties.

The enumeration is equivalent to identifying all the non-transverse
configurations between line segments in a plane, given the chosen context. We
assume the measurement is the orientation φ of one line to the other, and the
position x, y of the end-point of one line with respect to the other. Hence the
relative positioning has three degrees of freedom (DOF). Referring to Figure 2,
the uninformative, transverse regularity chooses x, y, and φ arbitrarily,
producing two skewed lines. (Intersection or not was not specified in our model
class and an “X” will be treated as equivalent to skew without crossing.) First,
with care the end of one line can be placed on the other (or its extension),
eliminating one degree of freedom. These configurations are assigned a
codimension of one. Next, with still more care, we can place the end of one line
exactly on the end of the other, allowing only the angle φ to vary. This
arrangement has a codimension of two.

Similarly, if the lines are parallel, then the orientation is fixed and the
cost, or codimension, of the arrangement is again one. However, parallel and
coincident lines, with one end allowed to slide along the other, increase the
codimension further to two. Finally, we have the last, most special case of
positioning, of codimension three, where the two lines merge into one when
placed end to end in a parallel arrangement.
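
The codimension bookkeeping above can be summarized in a short sketch. The constraint names and configuration labels below are illustrative assumptions. The only facts taken from the text are that the relative placement has three degrees of freedom, that placing an end on the other line removes one, that placing it on the other end removes two, and that parallelism removes one.

```python
TOTAL_DOF = 3  # x, y of one end-point plus the relative orientation

# Degrees of freedom removed by each level of the "coincident" process.
COINCIDENCE_COST = {"free": 0, "end_on_line": 1, "end_on_end": 2}


def codimension(coincidence, parallel):
    """Count the degrees of freedom removed by the chosen constraints."""
    return COINCIDENCE_COST[coincidence] + (1 if parallel else 0)


# Hypothetical labels for the six configurations described in the text.
CONFIGURATIONS = {
    "skewed": ("free", False),
    "end on line": ("end_on_line", False),
    "end on end": ("end_on_end", False),
    "parallel": ("free", True),
    "parallel, end sliding on line": ("end_on_line", True),
    "parallel, end to end": ("end_on_end", True),
}

for name, (coincidence, parallel) in CONFIGURATIONS.items():
    cost = codimension(coincidence, parallel)
    print(f"{name:32s} codimension {cost}, remaining DOF {TOTAL_DOF - cost}")
```

Treating codimension as a simple sum works here only because, in this context, the positional and angular constraints are independent of one another.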


3.0 From Features to Categories

Our main point will be that the “interesting” structural regularities in a model
class - namely those that satisfy the key feature conditions - can be used as a
basis for partitioning the model class into categories. In the previous example,
the property space would be built from the end-point position measurements x, y,
and the relative pose φ. Within this x, y, φ space, our proposal is that the
partitioning should make explicit the line-to-line non-transversalities
illustrated in Figure 2. If this scheme is adopted, then the subspaces will
preserve the character of the non-transversal modes, thus distinguishing among
the interesting properties. Note that the context sensitivity is critical to our
set-up, because it permits legitimate reconfigurations of the property space
depending upon the observer's goals, etc.


3.1 “Two Stick” Categories: The “Structure Lattice”

Continuing our example, we identify the modal subspace as that associated with
our “two-stick” model class presented earlier in Figure 2. Each of these
configurations has a codimension, which allows us to place each of these
non-transverse modes in a lattice, where each node depicts a proper subspace in
the particular context (Figure 3A). The top node shows the arbitrary two-stick
configuration. As we move down the lattice, the nodes below differ at each
successive level by the removal of exactly one degree of freedom from the
configuration. Upward transitions, then, are the elemental ones that locally
“break” or “unfold” a non-transversal property but which do not add any
additional non-transverse properties. An important example is the missing link
between the “V” and “parallel line” nodes (or similarly, the “T” and “collinear”
nodes). There is no direct route from one node to the other. The explanation is
that the concept “coincident” imposes a constraint on the endpoint position x, y
of one line with respect to the other, whereas the concept “parallel” is
expressed by an angular relation, φ, between the two lines. Because position
(x, y) is not defined by angle (φ) or vice versa in this context, there is no
intersection other than the excluded degenerate case of two coincident lines. A
similar explanation applies to the missing path between the two “collinear”
lines and the “T” node.
Figure 3 A: “Dot-on-line” categories. “Two-stick” categories given concepts
coincident and parallel. See text for explanation of dashed paths and nodes.
B: Beetle taxonomy, based on a version of a “two-stick” mode lattice.


At the bottom of the lattice, two nodes have the two sticks collapsed to one.
These two nodes have broken outlines to indicate that they are not part of the
lattice for the perceptual context because they suggest a “one-stick”
configuration. (If the two sticks were each identified in some manner, say by
coloring, then the dashed paths and nodes would become part of this “two-stick”
category lattice.) Thus, the result of this construction is a lattice that
displays a partial ordering of the categorical states available to the
perceiver, given the concepts “parallel” and “coincident”. We call such an
ordering a “structure lattice”.


3.2 “Natural Example”: Beetle Lattice

In the biological realm, growth processes exhibit regularities (Thompson, 1952).
To illustrate how such regularities can be used for a taxonomic classification,
we will use a simple modification of the “two-stick” mode or structure lattice
of Figure 3A. Let the context be the backs of beetle-like bugs that are marked
by two distinctive lines oriented with respect to the symmetry axis of the
beetle. As is typical for biological shapes, we assume the markings are
generated symmetrically about this axis. Hence, with respect to our “two-stick”
mode lattice, one stick - the “reference stick” - will simply be the symmetrical
bisector of the beetle's back. The other, namely the “second stick”, will thus
appear twice in mirror image positions about this symmetric axis. (The situation
is equivalent to symmetric markings appearing on a left and right wing.) As
before, we assume two possible marking processes, one laying the mark down
parallel to the bisecting axis, the other positioning the end point (of either
the reference stick axis or the additional marking stick) to be coincident with
one of the two lines. All of this sets the context.

Because the two-stick modes in a similar context have already been enumerated,
we simply need to recast the previous lattice of Figure 3A in a symmetric form
compatible with this revised “biological” context. This has been done in
Figure 3B, where now each node depicts the markings on the beetle's back. At the
top, the two symmetric marking lines are set arbitrarily with respect to the
bisecting axis (dotted). This is the codimension 0 case for this species. At the
next level either the coincident or parallel process applies, giving us three
codimension 1 subspecies. Next, we have two codimension 2 cases: one where the
marking lines form a V, coming together at the “head” of the reference line, and
the other where the two marking lines collapse onto the reference line (but do
not reach the head of the beetle). Finally we have a single codimension 3 case
in this context where the “V” collapses onto the reference bisecting line. Given
these generating processes and this context, these are all the types of beetles
expected. These types, with the exception of the “generic” beetle at the top,
represent the beetle modes or subspecies, each exhibiting a slightly different,
but related regularization of the ontology of beetles. Thus the beetle lattice
is a convenient hypothesis generator for an observer who is seeking to assign
any particular beetle to its “natural” category (Feldman, 1992; Leyton, 1984).
Elsewhere, we consider how an observer can induce new categories from the
regularized partitioning (Feldman, 1992).


4.0  Percepts  and  Categories 

Our basic idea, then, is that the structure and parameterizations of our models
describing the world should match the regularities of the image structure as
closely as possible. If we wish to extract world structure from image structure
in a “vivid” manner (Levesque, 1978), then the properties we especially note in
the image should directly point to very specific world properties. This
criterion clearly places very strong constraints upon the kinds of image
structures we should note, because not all properties of an object can be
expected to appear reliably in an arbitrary view of that object.

To clarify this point further, consider the pillboxes shown in Figure 4. In the
upper left, most observers immediately see the handle as rectangular and erect,
with the feet lying on the top of the pillbox. However, an infinity of
interpretations are possible for this single snapshot. For example, the handle
really could be skewed and lying flat on the top - or any other state between
flat and erect, including some slants toward the viewer. Nevertheless, observers
quickly accept just one interpretation from the many. As will be shown below,
such a vivid and compelling perceptual conclusion follows if perceptual
categories are built around modal key features.

Figure 4 Some pillboxes with handles. In the upper left depiction, most
observers immediately see the handle as rectangular and erect, whereas in the
upper right the handle now appears flat. In the two lower panels, both the shape
and inclination of the handle are less clear, the percepts exhibiting some
multistabilities. Most observers favor an inclined rectangular handle for the
lower left; the lower right drawing, however, yields mixed reports.

Let us first examine the orthographic projections of two rectangular handles
onto the image plane as illustrated in Figure 5. The normal to a surface N and
the visual ray to a point on the surface define a plane perpendicular to the
surface at that point. This plane also defines a line in the image. Then the
surface normal and any other vector in this plane must project into this image
line. The bisector B of a rectangular handle perpendicular to the surface is one
such vector. We will define such a handle as an erect, rectangular handle.
However, if the same rectangular handle is not erect, i.e. is inclined at some
angle to this perpendicular plane, then the angle of its projection is less
constrained. In particular, the bisector of a flat handle lying in the plane of
the surface can project into any angle in the image (see Figure 6). In a random
world, where both angles and orientations are cast out with equal probability,
the image distribution has a broad spectrum (Witkin, 1981). Clearly, if we had
to apply these data to infer the handle shape (i.e. its “skew”) and its
attachment angle, at best we could only make a maximum likelihood judgement that
would typically be wrong.¹ In order for the perceiver to develop the inferential
leverage needed to strongly disambiguate among many possible configurations of
equal likelihood, the world must behave somewhat more regularly (Lowe, 1985;
Witkin & Tenenbaum, 1985). In particular, some structures should tend to occur
significantly more often than predicted by a uniform distribution over all
possible structures.

¹ Surprisingly, given no other information, the maximum likelihood estimate for
the 3D angle is just the image angle itself!

Figure 5 Regardless of the viewing angle, an erect rectangular handle will
project onto the image plane with its bisector B oriented parallel to the
projection of the surface normal, N (orthographic projection is assumed).
However, if the handle is inclined to the surface or lies flat, then the
orientation between the bisector and normal can vary over a wide range,
depending on σ (see Figure 6).

[Figure 6 axis labels: erect, flat; vertical, horizontal.]

Figure 6 Left: Two states of a rectangular handle are taken as regularities in
the world: an erect state where the plane of the handle is perpendicular to the
top surface of the pillbox and a flat state where the plane of the handle
coincides with this top surface. The angle α is the angle between the bisector
of the handle B and the surface normal N. The dotted line labelled p0 indicates
the density function for arbitrary angles, other than the erect (0) and flat
(π/2) regularities which have spikes in the probability density function. In the
image (right), the erect handle also has a spike in the density function for
orientation, because parallel vectors in the world are parallel in the image,
hence the image angle φ of B' to N' is 0. All other handle inclinations project
onto image angles that depend upon the viewpoint, or “slant” of the surface (σ).

Consider then a world in which the perceiver knows that handles will often be
rectangular and will lie either flat, as if freely hinged and resting stably
under gravity, or erect, as if firmly attached perpendicular to the surface. In
this world, the distribution of handle orientations α, rather than being
uniform, will have two “spikes” or “modes”, one at each of the two special world
configurations as shown in Figure 6A. Now, depending upon the slant of the
surface, the expected image distribution of the handle bisector will be as in
Figure 6B: the “erect” bisector continues to stand out distinctively. As
discussed in the previous section, such an image feature is designated a “key”
feature because (i) its likelihood of correctly indicating the presence of a
particular world property is high (i.e. there are few false targets), and (ii)
the associated world configuration has a significant prior probability (see
Knill & Kersten, 1991; Jepson & Richards, 1992). This latter condition, though
often overlooked, is critical to establishing that a given high-likelihood world
interpretation is actually likely to be correct. In other words, the
configuration ascribed to the world by an inference must actually be one that
commonly occurs in the context. Otherwise, the probability of the inference
being correct will actually be dominated by the probability of a false target.
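
The projection argument for erect and flat handles can be checked numerically. The following is a minimal sketch under assumed coordinates: the surface normal N is taken along the z axis, the viewing directions are arbitrary choices that are not parallel to N, and the helper names are hypothetical.

```python
import math


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in vec))


def project(vec, view):
    """Orthographically project vec onto the image plane normal to view."""
    length = norm(view)
    unit = [c / length for c in view]
    depth = sum(a * b for a, b in zip(vec, unit))
    return [a - depth * b for a, b in zip(vec, unit)]


def image_angle(u, w, view):
    """Angle in radians between the image projections of u and w."""
    pu, pw = project(u, view), project(w, view)
    cosine = sum(a * b for a, b in zip(pu, pw)) / (norm(pu) * norm(pw))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding error


N = (0.0, 0.0, 1.0)        # surface normal (assumed along z)
ERECT_B = (0.0, 0.0, 2.0)  # bisector of an erect handle, parallel to N
FLAT_B = (1.0, 0.0, 0.0)   # bisector of a flat handle, lying in the surface
VIEWS = [(0.0, 1.0, 1.0), (1.0, 0.0, 2.0), (1.0, 2.0, 1.0)]

erect_angles = [image_angle(ERECT_B, N, view) for view in VIEWS]
flat_angles = [image_angle(FLAT_B, N, view) for view in VIEWS]
print("erect:", [round(math.degrees(a), 1) for a in erect_angles])
print("flat: ", [round(math.degrees(a), 1) for a in flat_angles])
```

For every view the erect bisector projects parallel to the projected normal, while the image angle of the flat bisector changes with the viewing direction.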


4.1  Structure  Lattice 

To set up our representation, we begin by introducing the vehicle called a
“structure lattice” that takes our context-sensitive, primitive concepts about
structural regularities, and composes them to produce a set of possible
configuration states (just as we did earlier for the “beetles”). This is the
first of several such lattices we will introduce, the one upon which the later
lattices will be built. Each of these lattices will display a partial ordering
of the categorical states. (See Moray, 1990, for a related proposal.) In the
case of the structure lattice, the ordering is derived by noting that some
states are special or limiting cases of others. Later we will impose
context-specific preferences upon this collection in order to seek a maximally
preferred state.

To illustrate again in detail the role of regularities in creating a
representation in which the perceptual categories become obvious, let us propose
that alignment and perpendicularity be special regularities between lines (or
vectors) that we encounter in our non-random world. For example, assume that
object parts have coordinate frames that are often aligned in some manner
(Arnold & Binford, 1980). For the pillbox and handle, we have two “parts” and
hence two coordinate frames. Let us specify the coordinate frame for the pillbox
by its symmetry axis A, and by the feet of the handle H. (See Figure 7.) We will
assume that the pillbox has been cut at right angles to A, and hence the surface
normal N to the top of the pillbox will align with the axis A. (Note this
assumed axiomatic regularity!) Together, A and H (or henceforth N and H) set up
a right-angled Cartesian coordinate frame at the center of the top of the
pillbox. Let K be a unit vector orthogonal to N and H, defined by K = N × H. To
construct K', the projection of K into the image, we use the maximum likelihood
rule for slant derived by Kanade (1983), which was observed psychophysically by
Stevens (1983) for right-angled coordinate frames. Specifically, K' will lie on
the bisector of the angle between N and H, illustrated in Figure 7. (Shortly we
will explain the role of the numbered sectors marked on the top of the pillbox.)

Similarly, a coordinate frame for the handle can be defined by its vertical
symmetric bisector B and by a second vector H which is the direction of the feet
of the handle. Note that we do not assume that B and H are perpendicular.
However, the origins of the two coordinate frames, B ∧ H and N ∧ H, are assumed
to lie centered on the plane of the top of the pillbox, and coincident with the
major axis of the pillbox. We thus are assuming the following:


Contextual Regularities:

Parts:                      Pillbox is convex (e.g. solid top).
                            Handle is planar.
Support:                    Both feet of handle lie in plane of top surface
                            of pillbox (B lies on or above this plane).
Surface Normal Alignment:   N = A
Gravity Alignment:          A = G
Cartesian Frame:            N·H = 0
                            K·H = 0
Viewpoint:                  Pillbox is seen from above.
Figure 7 Unit vectors that define the handle and pillbox coordinate frames.
The diagonal line that bisects N, H sets up the third right-angled axis K. The
dotted ellipses are circles that create the six sectors discussed in the text.
The insets depict handles with bisectors projecting into the various sectors.


The additional vector G is taken to be the gravity axis, which is aligned with
the customary page orientation, typical for the depiction of a stably supported
object.² In sum, the above equalities set up two coordinate frames, one
rectangular for the pillbox defined by N and H and the other not necessarily
rectangular for the handle defined by B and H.

² Elsewhere we have explored this preference for supported objects (Jepson &
Richards, 1998).

Given the vectors N, B, H and K we can now explore all possible alignments of
these vectors. Recall we are proposing that the perceiver is aware of certain
“modes” or configurations of structures that occur often in the world. In
particular, the special regularities we chose were the collinearity of two lines
or vectors, such as B = N, and the perpendicularity of two lines, such as B ⊥ H,
which corresponds to a rectangular handle, or B ⊥ N, which defines a flat
handle. Hence to generate all these special configurations that are the
consequence of these particular relational concepts, we simply enumerate all the
alignments of B with N, K and H, using either the collinear (=) or perpendicular
(⊥) relation. The result of this enumeration will then be those special
categories that make sense to us, given our chosen relational concepts. We begin
first with the three collinear alignments:
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ColMnear  Relation 

Category 

Wototigp 

B  =  N 

erect  rectangular  handle 

ER 

B  =  K 

flat  rectangular  handle 

FR 

B  =  H 

degenerate  (infinitely  skewed  handle) 

If the bisector B does not align with either N, K or H, then we define the handle
as being either "tilted", which is noted as "T", or "skewed", which is noted as "S",
or both, namely "TS". In particular, if the bisector is in the plane determined by
N and K, then the handle is tilted and rectangular, i.e. "TR". Similarly, if the
bisector is in the plane containing N and H, it will be erect and skewed, i.e. "ES",
while for the flat and skewed state the bisector will be in the H-K plane. Thus,
excluding the above collinear specialisations, we now have the following additional
three new cases (alternatively we could have filled out a 4 x 4 table):


Perpendicular Relation    Description                  Notation

B ⊥ H                     tilted rectangular handle    TR
B ⊥ K                     erect "skewed" handle        ES
B ⊥ N                     flat "skewed" handle         FS


Finally,  we  have  the  category  where  none  of  the  relations  hold: 

Arbitrary  Relation  Category  Notation 

(none  of  the  above)  tilted,  skewed  handle  TS 

Excluding the degenerate case B = H, we thus have six types of categories for
the positioning of the handle, given our conceptualisation that part-based structures
in the world typically are related by an alignment of some aspect of their individual
coordinate frames. Because we can count the number of axes of each frame that
are aligned (i.e. either one axis or two), a partial ordering can be placed on these
six types of structures. This is illustrated in Figure 8 as a graph or lattice. At
the top of this lattice, the positioning, T, and shape, S, of the handle are arbitrary.
At the bottom, however, we have two states where the position and shape of the
handle are both fixed to be rectangular and either flat or erect (i.e. FR and ER).
In other words, all degrees of freedom of alignment have been removed. In between
are the planar alignment states where one degree of freedom of movement is still
allowed. For example, the leftmost node ES permits the skew of the handle to vary,
but it must remain erect. Hence, as we move from top to bottom in this lattice,
more and more specialization or restrictions are placed on the configuration. As
previously mentioned, we call this lattice a "structure lattice" because, given the
assumed alignment regularity, this lattice shows the specialization relations between
the categories of structures that will appear in our representation.
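The structure lattice just described can be sketched in a few lines of Python. This is an illustrative reading of the text, not a reproduction of Figure 8: the two-letter state codes are the notation introduced above, whereas the helper names `free_parameters` and `specializes`, and the assumption that each lattice edge fixes exactly one free parameter (tilt or skew), are ours.

```python
# The six non-degenerate handle states: first letter is the inclination
# (E = erect, F = flat, T = tilted), second is the shape (R = rectangular,
# S = skewed).
STATES = ["TS", "ES", "TR", "FS", "ER", "FR"]


def free_parameters(state):
    """Number of unaligned degrees of freedom: arbitrary tilt (T) and skew (S)."""
    return state.count("T") + state.count("S")


def specializes(general, special):
    """True if `special` results from `general` by fixing one free parameter.

    Assumption for illustration: an edge of the lattice either fixes an
    arbitrary tilt to erect or flat, or fixes an arbitrary skew to rectangular.
    """
    g_incl, g_shape = general
    s_incl, s_shape = special
    fix_tilt = g_incl == "T" and s_incl in "EF" and g_shape == s_shape
    fix_skew = g_shape == "S" and s_shape == "R" and g_incl == s_incl
    return fix_tilt or fix_skew


# Edges run from the more general state to the more specialised one.
LATTICE_EDGES = [(g, s) for g in STATES for s in STATES if specializes(g, s)]

if __name__ == "__main__":
    for state in STATES:
        print(state, "free parameters:", free_parameters(state))
    print(LATTICE_EDGES)
```

Under these assumptions TS sits at the top with two free parameters, ES, TR and FS form the middle level with one, and ER and FR sit at the bottom with none.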




RICHARDS    FINAL REPORT 1990-93


Figure 8 The structure lattice for the pillbox plus handle (i.e. the "state
space").

4.2  Preference  Relations 

The  structure  lattice  simply  enumerates  the  structural  categories  that  we  know 
about, or can easily infer, given our chosen regularities. Ideally, we would hope
that the image is consistent with some kind of maximisation of these regularities.
In other words, given a particular context, we expect certain regularities to appear,
but in another context the structures expected might differ. For example, a "flat"
handle would not be likely if the pillbox were upside down. This suggests that given
a  context,  there  is  a  preference  ordering  on  the  expected  regularities.  If  you  will,  a 
ranking  is  given  to  the  prior  probabilities  of  the  structures  that  are  expected  in  the 
assumed  context. 

A preference ordering differs from the structure ordering introduced in the
previous section. The structure lattice simply presents all the categories available
to us in the chosen context, ordered with respect to increasing specialisation of
structure. A preference ordering specifies which kinds of specialisations are preferred
to others. So, for example, given a choice between handle shapes that are
rectangular or skewed, we'll prefer the rectangular version. This preference is not
surprising, because, after all, we elected to parameterize the coordinate frame for
handle shape in terms of rectilinear coordinates. Denote this preference for rectangular
over skewed shapes as R > S. Similarly, for the attachment angles, our
parameterization suggests that the erect "E" and flat "F" angles will be preferred
over arbitrary inclinations, or "T", hence E > T and F > T. However,
we have no reason to believe that an arbitrarily shaped erect handle "E" will be
preferred over one that is flat, "F". Denote this indifference by E ~ F. We will
designate these three orderings, one for shape and the other two for attachment
angle, as the "elemental" preference orderings. They can also be cast in the form of
a directed graph as in Figure 9.

Figure 9 Elemental preference relations for handle shape (left) and handle
inclination.

Using the above elemental preference relations, we can now impose a partial
order on the states of the base plus handle configurations that we know about,
namely the states shown in Figure 8. This preference ordering is based on the
consensus of the elemental preference relations, and is illustrated in Figure 10. Note
that a state such as ER is to be preferred over TS because both of the elemental
preference relations, E > T and R > S, favor the same state. However, such a
consensus does not always occur. For example, the same two elemental relations
are in conflict for the states ES and TR, and as a result these two states remain
unordered in the preference ordering. The intuition behind such unordered states is
that the perceiver does not have sufficient information to be able to resolve whether
ES should be preferred to TR, or vice versa. Thus unordered states represent a
total lack of information on the appropriate preference. In addition, we also have a
distinct notion of an equal preference between two states, such as occurs between
ER and FR, as well as between ES and FS.
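The consensus rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the integer ranks are merely a convenient encoding of E > T, F > T, E ~ F and R > S, and the function name `compare` and its return symbols are hypothetical, not part of the original framework.

```python
# Elemental preference ranks (higher = more preferred). Equal ranks encode
# indifference. The numeric values are an illustrative encoding only.
INCLINATION = {"E": 1, "F": 1, "T": 0}  # E > T, F > T, E ~ F
SHAPE = {"R": 1, "S": 0}                # R > S


def compare(a, b):
    """Consensus comparison of two states such as 'ER' and 'TS'.

    Returns '>' or '<' when no elemental relation opposes the verdict,
    '~' when both elemental relations are indifferent (equal preference),
    and None when the elemental relations conflict (unordered states).
    """
    d_incl = INCLINATION[a[0]] - INCLINATION[b[0]]
    d_shape = SHAPE[a[1]] - SHAPE[b[1]]
    if d_incl == 0 and d_shape == 0:
        return "~"
    if d_incl >= 0 and d_shape >= 0:
        return ">"
    if d_incl <= 0 and d_shape <= 0:
        return "<"
    return None  # one relation favors a, the other favors b


if __name__ == "__main__":
    for pair in [("ER", "TS"), ("ES", "TR"), ("ER", "FR"), ("ES", "FS")]:
        print(pair, compare(*pair))
```

Note that the sketch keeps "unordered" (None) distinct from "equally preferred" ('~'), mirroring the distinction drawn in the text.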

In general we cannot expect a consensus ordering to provide a total ordering of
the state space, because some conflicts amongst the elemental preference relations
are likely to hold. This is related to Arrow's general impossibility theorem which
states that rational choice - i.e. rational voting behaviour - does not guarantee a
unique winner (Doyle & Wellman, 1989; Saari, 1992). Somewhat counter-intuitively,
the introduction of more elemental preference relations does not lead to a more
complete ordering, but rather tends to introduce more conflicts and hence tends
to eliminate ordering relations. To counteract this tendency to fracture the state
space, it is often useful to consider priorities amongst the elemental preference
relations (Jepson & Richards, 1993). Such priorities can break particular conflicts
and thereby enlarge the ordering. Nevertheless, we should expect typical preference
orderings to be partial as a consequence of the incomplete knowledge a perceiver
has of its current domain. Of particular interest are instances in which the ordering
results in several maximally preferred explanations of the image structure, where
it remains undecided just which maximal state is to be preferred. As we discuss
elsewhere, this is an intuitive explanation behind the difference in the stability of
the percepts in the upper and lower panels of Figure 4.

4.3 The Pillbox plus Handle

To clarify our framework further, we now return to Figure 4, and use these images
together with the preference relations to impose an ordering on the state space in
each case. For all depictions, we assume that the view is from above. We also
initially assume that a Cartesian coordinate frame for the pillbox is set up using
the Kanade-Stevens rule, namely that the axis K is seen as lying along the bisector
of the image angle between N and H (i.e. as shown in Figure 7). We take this
coordinate frame as being the entire coordinate system containing the line H and
the line perpendicular to N. Later we will consider cases when this frame itself
appears as a preference that may be altered.

We begin by choosing a representation that allows us to effortlessly read off
the states of the handle of interest, given a particular image. Figure 7 shows the
form of this vivid representation, based on the N, K and H coordinates. The added
feature is that now we identify the six sectors of the unit circle (seen slanted) that
lie between the projections of these axes of the coordinate frame. The idea
then is to regard the bisector B as the arm of a clock, and simply note either the
sector it falls in, or whether it is precisely aligned with one of the axes. A simple
example is when B is aligned with either N or K. If B = N, then obviously the
erect rectangular handle ER is a possibility, because the handle is ER if and only
if B = N, whereas if the handle is flat and rectangular, then B = K. Similarly,
if B falls into one of the six sectors, again we can easily check to see if a state is
consistent or not. For example, when the handle is rectangular and tilted forward,
B must be in the upper quadrant of the NK plane and hence its projection must
fall into sectors 2 or 3 (see insets to Figure 7). Similarly if the handle is erect but
skewed, B must lie in sectors 1 or 6 (if skewed to the left) or sector 2 (if skewed
to the right). The following table captures all the cases (excluding the alignments):


Sector    Possible Categories

1         TR (backward), ES, FS, TS
2         TR (forward), ES, FS, TS
3         TR (forward), FS, TS
4         FS, TS
5         FS, TS
6         ES, FS, TS


Table 1 The possible attachment categories for handle pose, given the sector
that the bisector B falls into. (See Figure 7.)

Note  that  our  condition  that  the  handle  lies  on  or  above  the  top  of  the  pillbox 
constrains  TR  and  ES  to  require  that  B  not  fall  in  sectors  4  and  5. 
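Table 1 is in effect a lookup from the sector containing B to the set of feasible categories, and can be written down directly. The sketch below simply transcribes the table; the names `SECTOR_CATEGORIES` and `possible_categories` are illustrative, and the forward/backward distinction for TR is recorded only in the comments.

```python
# Table 1 as a lookup: sector of the bisector B -> feasible handle categories.
SECTOR_CATEGORIES = {
    1: {"TR", "ES", "FS", "TS"},  # TR tilted backward
    2: {"TR", "ES", "FS", "TS"},  # TR tilted forward
    3: {"TR", "FS", "TS"},        # TR tilted forward
    4: {"FS", "TS"},
    5: {"FS", "TS"},
    6: {"ES", "FS", "TS"},
}


def possible_categories(sector):
    """Return the categories consistent with B falling in the given sector (1-6)."""
    return SECTOR_CATEGORIES[sector]


if __name__ == "__main__":
    for sector in sorted(SECTOR_CATEGORIES):
        print(sector, sorted(possible_categories(sector)))
```

The constraint noted above is visible in the data: neither TR nor ES appears in sectors 4 and 5.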

The state space for the two drawings in the upper panel of Figure 4 is
given in Table 2. Again, we use the notation E, F, R respectively to indicate an
erect, flat or rectangular handle, and S and T respectively to indicate a skewed
or "tilted" handle. First consider the possibilities for the "erect" handle depiction
in the upper left drawing. The bisector B aligns with N. Hence ER is an obvious
choice. However, B can also lie off N, but in the plane defined by the visual ray.

Upper Left Drawing    Handle State             Upper Right Drawing

TS                    arbitrary tilt & skew    TS
*                     erect, skewed handle
FS                    flat, skewed handle      (FS)
*                     tilted, rectangular      (TR)
*                     flat, rectangular        FR
ER                    erect, rectangular

Table 2 State spaces for the two drawings in the upper panel of Figure 4.


All of these states correspond to either tilted and skewed (TS) handles, or perhaps
a flat and skewed (FS) handle. Note that erect and skewed (ES) is not consistent
with the Kanade-Stevens coordinate frame since, from Figure 7, we see the only
way the handle can be in the NH plane (i.e. erect) yet have B align with N in
the image is for B to equal N (i.e. the ER state). Similar arguments showing that
the TR and FR states are inconsistent can also be read off Figure 7. These three
inconsistent states are indicated by an asterisk in Table 2. The remaining three
valid states, TS, FS, and ER can now be ordered using consensus amongst the
elemental preference relations introduced above. The result is shown in Figure 11
(left), and is seen to be a total ordering with the erect rectangular handle (ER) as
the unique maximal state.

Similarly, for the upper right drawing we first note that the leg of the handle,
hence the bisector B, appears to align with the axis K in the Kanade-Stevens coordinate
frame for the pillbox. Therefore, FR is obviously in the state space. However,
the true 3D orientation of B need not be coincident with K, but can point anywhere
in the plane created by the lines of sight through K, and hence TS is also a
possibility. Obviously the erect states ES and ER are excluded because B lies in
sectors 3 and 4 below H. States TR and FS are marginal, depending on whether B
is taken to be precisely aligned with K or not. If B is seen to fall below K in the
representation depicted in Figure 7 (i.e. in sector 4), then TR is not in the state
space. But if B lies above K (in sector 3), then TR is a possibility. In either case,
FS is possible. Because of this ambiguity TR and FS are shown parenthetically in
Table 2, and as dotted nodes in the preference ordering of Figure 11 (right). Again,
the ordering here follows from the relations F > T and R > S, yielding the state
FR seen most "vividly" as the maximal node.

For the two more ambiguous drawings in the lower panel of Figure 4 we may
go through a similar exercise. This is treated elsewhere (Richards & Jepson, 1994).

4.4 Summary

Our notion then is that each image is evaluated with respect to the current set of
observed regularities. These regularities suggest a context that dictates the form of the
model representation. Given this representation and the image, a set of categorical
structures can be deduced easily as "vivid" states (i.e. the state space). The context
also points to preferences for certain 3D regularities in the representation, which are
used to place an ordering on the feasible states or "interpretations". Hopefully there
will be a unique global maximum in this ordering that "explains" all the observed
regularities, given the image and the preferences (such as in Figure 11). If not, or if
further regularities are observed in the resultant 3D interpretation, or if additional
relevant premises are retrieved from the knowledge base, the context may be revised
and the process continued with the aim of insuring that all regularities, both
in the image and in the interpretation, are explained by the preferences at hand.
Sometimes, as in the lower panel of Figure 4, closure is not possible, and several
maximal interpretations continue to be evaluated. In all cases, the explanation of
the image attempts to maximize our preferences for certain world regularities over
other states. This leads to the following proposal for defining a "percept":


Proposal 1: given a context, a percept is an interpretation in
the state space that is locally maximal within the associated
preference ordering.

Elsewhere (Jepson & Richards, 1993), we have elaborated the consequences of this
proposal, and its implications for the machinery that underlies the perceptual process
itself. However, of special interest may be the relation between the above
Boolean proposal for percepts and one based on versions of utility or probability
theory (such as Dempster-Shafer or Pearl's Bayesian graphs). A partial bridge to
these alternative approaches using a Bayesian formalisation is in the press (Richards
& Jepson, 1994).
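Proposal 1 can be illustrated on the handle example with a short Python sketch. It treats a percept as a maximal element of the feasible state space under the consensus ordering; for a finite state space we take "locally maximal" to mean that no other feasible state is strictly preferred. The numeric ranks and the names `strictly_preferred` and `percepts` are assumptions made for illustration.

```python
# Illustrative ranks encoding E > T, F > T, E ~ F (inclination) and R > S (shape).
RANK = {"E": 1, "F": 1, "T": 0, "R": 1, "S": 0}


def strictly_preferred(a, b):
    """True if a beats b by consensus: no elemental relation favors b,
    and at least one favors a."""
    diffs = [RANK[x] - RANK[y] for x, y in zip(a, b)]
    return min(diffs) >= 0 and max(diffs) > 0


def percepts(state_space):
    """Feasible states to which no other feasible state is strictly preferred."""
    return {
        s for s in state_space
        if not any(strictly_preferred(t, s) for t in state_space)
    }


if __name__ == "__main__":
    print(percepts({"TS", "FS", "ER"}))        # valid states, upper left drawing
    print(percepts({"TS", "FS", "TR", "FR"}))  # upper right drawing
    print(percepts({"ES", "TR"}))              # a conflicting pair
```

With the state spaces of Table 2 this yields a single maximal state for each of the two upper drawings, whereas a conflicting pair such as ES and TR leaves two maximal interpretations standing.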


5.0 Trajectory Mapping (TM)

According to our view, special features, called "key features", are critical to setting
up perceptual categories and to driving our percepts. Once these features are
chosen, then the parameterisations in the perceptual space - the feature space if
you will - are forced. Over twenty years ago, multidimensional scaling was introduced
to recover the parameterisations used by the human perceiver in a variety
of domains. Recently, we have invented a new scaling procedure, called "Trajectory
Mapping", that specifically follows the feature paths in the perceptual space
(Richards & Koenderink, 1993A).

Briefly, TM is based on the estimation of smooth trajectories in a multidimensional
feature space. Given a set of samples, pairs are used as fixed points
on a trajectory that is indicated by choosing the most appropriate extrapolants
(interpolants) from the remaining samples. A distance measure for these choices is
estimated using the current two fixed points as the unit. Unlike MDS, estimates are
not made if the fixed point samples are deemed incompatible. Samples that lie
on a bound in the space - either interior or exterior - are also noted. Hence, trajectories
are not estimated across mutually exclusive features, such as the "red-green"
exclusion in color. The result is a web of smooth trajectories between samples in
the feature space - something akin to a subway map - with some of the trajectories
terminated either at the boundaries of the space, or at the exclusion boundaries.
These trajectories of feature similarities are then scaled in a higher dimensional
space that produces a hypersurface on which the trajectories correspond to "least
energy" paths. For example, three point samples on a spherical surface with no
exclusions could produce four trajectories. Three are great circles and the fourth is
the intersection of the sphere with the plane containing the three points. A hyperbolic
surface will create different paths in the feature space. To compare MDS and
TM, we present scaling results for 20 OSA uniform color samples (J. Opt. Soc.
Am. 64, 1691, 1974). A ridge along the yellow-blue axis appears, which both deflects
and attracts the trajectories, which do not cross this ridge. The neutral gray
point also appears as a post through which no trajectory passes. Similar behaviors
are noted when the TM method is used to scale thirty natural textures taken from
the Brodatz album (Dover, 1966). A complete report on this method and these
preliminary results will appear shortly (Richards & Koenderink, 1993B).

6.0 Dynamical Systems Analysis: Is Perception Chaotic?

The multistability of impoverished visual displays, such as the Necker Cube or
the crater illusion presented in Figure 12, is well known. Less appreciated is the
fact that even complex scenes, although rich in visual information, also may have
many interpretations - indeed perhaps even an infinity depending upon the level of
detail desired to "explain" the scene. Different patterns of eye movements are often
associated with these alternate interpretations, as shown by Yarbus (1967). Hence,
despite our impressions to the contrary, the chosen percept typically entails the
selection of one out of many possible interpretations of the input, even if the context
remains unchanged (Jepson and Richards 1993). Here we present evidence that
processes searching for the appropriate percept have some characteristics associated
with low dimensional, chaotic dynamical systems. Our results confirm a proposal
by Poston and Stewart (1978) and conclusions reached by Ta'eed et al. (1988) that
multistable percepts can be modelled as non-linear dynamical systems.


Method 

A dynamical system can be characterised by the geometry of the space of its output.
These outputs are typically a sequence of state changes, either temporal or spatial.
Hence possible measures that we can use to evaluate a perceptual dynamical system
are (1) the sequence of time intervals between perceptual states, such as the duration
of successive flips of a Necker Cube, or (2) the sequence of spatial vectors, such as
when the eye moves from one (x,y) position to another. In either case the data will
be a sequence of values, {vi}, with i typically running beyond 200. To determine whether
an observed pattern exhibited chaotic behavior characteristic of a dynamical system,
we chose a correlation technique outlined elsewhere (Bergé et al. 1984; Grassberger
and Procaccia 1983). This technique has been especially successful in using time
series data to analyze non-linear physical systems, such as turbulent flow (Essex,
Lookman & Nerenberg, 1987; Malraison et al. 1983) or semiconductor resonators
(Van Buskirk & Jeffries 1985; Theiler 1990). Briefly, the method first computes the
number of vectors of length |xi - xj| falling within a p-dimensional hypersphere of
radius r, where {x} is the set of m sample points (e.g. time intervals) collected:

$$C_p(r) = \lim_{m \to \infty} \frac{1}{m^2} \sum_{i,j=1}^{m} H\left(r - |x_i - x_j|\right)$$

Cp(r) is the average number of vectors or the "correlation coefficient" for the chosen
radius, p is the "embedding" dimension, m is the total number of values in the
sequence and H is the Heaviside function, which evaluates to one if the distance
between the pair i, j is less than r, otherwise zero. In other words, the correlation
function counts the number of pairs with distance |xi - xj| smaller than r, where r is
the radius of a p-dimensional hypersphere. The method thus provides an assessment
of the geometry of the space from which the samples are taken.

upside down.) [Courtesy of Wide World of Photos, Nov. 1973.]

For example, if the samples are uniformly distributed on a plane, then increasing r
will increase the correlation count as the square of the radius. For each embedding
dimension p, an exponent ep(r) = log[Cp(r)]/log[r] is calculated over r. The maximum
value of this exponent ep is then determined and plotted against p. If a random time
series is evaluated by this method this exponent rises with the embedding dimension
(i.e. ep = p, or more correctly, ep ≈ p for stochastic series). If a deterministic chaotic
series is evaluated, such as that generated by a Henon attractor, then the maximal
values of ep asymptote at some pmax for all p > pmax. For a typical predator-prey
dynamical system, such as May's logistic function, pmax = 1-2. Figure 13 illustrates
this method. For each p, log Cp(r) was plotted versus log r and the steepest slope
ep was computed using 6 points for r separated by factors of 1.2 (i.e. 2^(1/4)). Note
that as p increases, so does ep. These slopes were then plotted versus p, as shown
in Figure 14. The error bars show the range of slopes, based on adjacent points,
within the six-point average plotted as a filled circle. The weakness of the method
is also illustrated here: for large embedding dimensions, p > 5, the uncertainty in
ep increases substantially, especially when the number of data points is small (i.e.
N = 200). Hence in our study estimates of ep are restricted to p ≤ 8.

Figure 13 First step in the analysis of the time sequence of state changes
for the crater illusion of Figure 12. Abscissa is the radius r of the hypersphere
of dimension p. Each point shows the value for that Cp(r). The slopes on this
log-log plot yield the values for ep.
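The correlation count and its log-log slope, as described in the Method section, can be sketched in plain Python. This is a hedged illustration and not the analysis code used for the results reported here. It assumes a standard delay embedding of the scalar series, normalises the count by the number of distinct pairs rather than by m squared (which only shifts log C by a constant), and estimates the slope by a least-squares fit over the supplied radii instead of taking the steepest six-point slope. All function names are hypothetical.

```python
import math
from itertools import combinations


def delay_embed(series, p):
    """Return the p-dimensional delay vectors (x_i, ..., x_{i+p-1})."""
    return [tuple(series[i:i + p]) for i in range(len(series) - p + 1)]


def pair_distances(points):
    """Euclidean distances between all distinct pairs of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def correlation_sum(distances, r):
    """Fraction of pairs closer than r (the Heaviside count, normalised)."""
    return sum(d < r for d in distances) / len(distances)


def scaling_exponent(series, p, radii):
    """Least-squares slope of log C_p(r) against log r over the given radii.

    Raises ValueError if some radius captures no pairs, since log(0) is undefined.
    """
    distances = pair_distances(delay_embed(series, p))
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(distances, r)) for r in radii]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    import random

    random.seed(0)
    noise = [random.random() for _ in range(400)]   # a stochastic test series
    radii = [0.05 * 1.2 ** k for k in range(6)]     # six radii, factor 1.2 apart
    for p in (1, 2, 3):
        print(p, round(scaling_exponent(noise, p, radii), 2))
```

For a purely random series such as the one in the usage example, the fitted exponent should grow roughly in step with p, which is the stochastic signature discussed above; a levelling off would instead hint at low-dimensional structure.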


Experiments 

We obtained data for four sets of perceptual tasks: binocular rivalry; fixation patterns
during search; simple multistable percepts; and perceptual segments or "eras"
in several movies. The panels in Figure 15, plus Figure 12, show examples of the
multistable stimuli used. Table 3 gives the number of reversals recorded, and the
mean reversal time.

Binocular Rivalry

When two patterns having significantly different spatial structure are presented,
each to a separate eye, typically one sees an alternation of one pattern with the
other. For example, if one eye is presented with vertical red and black bars, whereas
the other sees horizontal green and black bars, the percept is a dynamic alternation
between the two. The site of this competition or "rivalry" is known to be early in
the visual pathway, namely near the initial cortical area of processing (Julesz 1971;
Richards 1970). We measured the time series for 200 alternations for two subjects
and computed the correlation coefficients and exponents ep for embedding dimensions
p = 1 to 8. As shown in Figure 14, a plot of ep vs p is roughly a line of unit
slope, specifically 0.85 for this subject. This result, regardless of the underlying
cause, is in agreement with Levelt (1965) and is typical of a stochastic process such
as the pattern of rainfall. Hence we conclude that binocular rivalry is not a deterministic
chaotic process typical of a predator-prey competition (Kadanoff 1986;
May 1976).


Cognitive  Rivalry 

Unlike "passive" binocular rivalry, the flip-flops of the crater picture in Figure 12
reflect processes that are evaluating the image as a depiction of a world state or
"scene". The competition, if you will, is between alternate models that "explain"
the image. It is easy to imagine a process, such as the fluctuations in predator-prey
populations, where one explanation dominates for a while, and then another takes
over. We studied several simple displays such as the illusion of Figure 12, a picture
of a triangle with a stick lying across one edge, as well as some cognitive rivalry
tasks invented by Manfred Fahle (Fahle and Palm 1990). All of these examples
revealed hints of chaos. One of the clearer results was obtained from the time series
of flip-flops for the crater illusion. This display is particularly well-suited for study
because the perceptual biases can be manipulated by rotating the pattern, placing
the illuminant above/below or left/right - the latter being chosen because it yields
a more equal competition between percepts.

The data for observer AJ are presented in Figure 14, upper right. (The results
for WR were similar - see Table 3.) Note that, in contrast to binocular rivalry,
we do not see ep increasing linearly with p, but rather rising and leveling off near
ep = 3 for p = 4, then rising again. This "notch" in the curve is significantly
different from the linear extrapolation of the points for p > 5. We propose that
this "notch" results from a process exhibiting deterministic chaos of dimensionality
roughly 3.5, corrupted by a noisy stochastic process that "takes over" as the
embedding dimension increases.

Note that data for the reversals of a Necker Cube do not appear in Table 3.
As was found for binocular rivalry, only the hint of a "notch" was observed in the
plot of ep vs p. This result suggests that the dominant process causing the reversals
of the Necker Cube may be formally similar to the process that leads to the
competition observed during binocular rivalry.

Figure 14 Plots of ep versus the embedding dimension p obtained from data
like that illustrated in Figure 13. Upper left: binocular rivalry; upper right: the
crater illusion of Figure 12; lower left: monkey saccadic eye movements; lower
right: perceptual "eras" in "Out of Africa".


Eye  Movements 

Again  using  the  *notch*  as  our  indicator,  evidence  for  chaos  was  also  observed  in  the 
spatial  pattern  of  200  eye  movements  obtained  from  a  monkey,  who  was  searching 
a  17*  X  17*  field  for  a  target  (see  Sonuner  1993,  for  detdls).  Note  that  in  this 
case  the  analysis  was  not  performed  on  the  temporal  intervals,  but  rather  on  the 
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STIMULUS 

N 

MEAN 

DURATION 

DIMENSION,  p^ 

Binocular  Rivalry 

258 

2.7 

- 

Crater  Dlusion  (qj) 

8.2 

8.4 

Crater  Illusion  (wr) 

8.0 

8.6 

Tkiangle  and  Stick 

8.2 

8.5 

Duck-fish 

400 

4.8 

4.2 

Hag-girl 

200 

2.2 

4.8 

Eye  Movements 

200 

‘  0.2 

8.5 

•Out  of  Africa* 

820 

25.2 

8.8 

•ET" 

850 

16.8 

8.7 

•Chariots  of  Fire” 

850 

19.8 

8.5 

•Star  Wars” 

400 

15.2 

8.6 

AVG. 

8.7 

TkUc  S  Nombar  of  nmplM  N  aad  Mthnatat  of  dlmoBikm  mndorliliif  dyoMnkol 
Bynttm,  pmaa-  Tho  meaa  dnrstioiu  between  eamplee  ere  la  eeconde.  Tie  tjt 
movement  ajielyeia  ww  performed  on  the  epetiel,  not  the  temporal  patterns  of 
fixations;  the  eye  movement  duration  reported  above  is  the  mean  intersaccadic 
IntervaL 


magnitude of the saccade.^ The location of the notch in the curve of Cp vs p again
suggests an underlying dynamical system of dimension 3.5 which becomes masked
by a stochastic process at the higher embedding dimensions. We have attempted to
model these data using a combination of deterministic and stochastic processes, but
to date have not succeeded. [Although the unit slope observed for p > 5 is suggestive
of additive “random” noise, as yet we do not have a model that adequately explains
all our C(r) vs r observations. The data are not typical of the expected effects of
additive noise observed in physical systems (Ben-Mizrachi et al. 1984; Theiler
1990).]
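
The C(r) vs r and Cp vs p analysis referred to above can be sketched as follows. This is a minimal illustration, not the procedure used in this report: it assumes that C(r) is the usual correlation integral (the fraction of pairs of delay vectors closer than r) and that the quantity plotted against the embedding dimension p is the log-log slope of C(r). The function names, the lag of one sample, the max-norm distance and the radii are all assumptions made for the example, and the “notch” indicator itself is not reproduced.

```python
import math
import random

def delay_embed(series, p, lag=1):
    """Delay vectors (x[i], x[i+lag], ..., x[i+(p-1)*lag]) of embedding dimension p."""
    n = len(series) - (p - 1) * lag
    return [tuple(series[i + k * lag] for k in range(p)) for i in range(n)]

def pair_distances(vectors):
    """Max-norm distance between every distinct pair of vectors (an assumed choice of norm)."""
    return [max(abs(a - b) for a, b in zip(u, v))
            for i, u in enumerate(vectors) for v in vectors[i + 1:]]

def correlation_integral(dists, r):
    """C(r): fraction of vector pairs whose separation is below r."""
    return sum(d < r for d in dists) / len(dists)

def loglog_slope(dists, radii):
    """Least-squares slope of log C(r) against log r, skipping radii where C(r) = 0."""
    pts = []
    for r in radii:
        c = correlation_integral(dists, r)
        if c > 0:
            pts.append((math.log(r), math.log(c)))
    if len(pts) < 2:
        raise ValueError("need at least two radii with C(r) > 0")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# Hypothetical test series: pure uniform noise standing in for a list of durations.
random.seed(0)
noise = [random.random() for _ in range(300)]
radii = [0.1, 0.15, 0.2, 0.3]
slopes = {p: loglog_slope(pair_distances(delay_embed(noise, p)), radii)
          for p in (1, 2, 3)}
print(slopes)
```

For a purely stochastic series such as this one, the slope keeps growing as p increases, because noise fills whatever space it is embedded in. A low-dimensional deterministic system would instead be expected to level off once p exceeds its dimension, which is the kind of behavior the estimates in Table 3 are meant to capture.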


^The data we report were taken from the vertical component of the saccade.
Figure 15 Some of the stimuli used that have multiple stable states. Upper left:
a rivalrous pattern (dashed lines indicate the set presented to one eye, the solid
lines indicate the set presented to the other). Upper right: triangle plus stick
(stick is commonly seen as sitting on the edge of the triangle, or alternatively as
angled in space above the triangle with its right end touching the interior plane);
lower right: a single duck vs two fish reversal; lower left: Boring's ambiguous
mother-in-law: an old woman vs a young girl.


Movies 

The most complex perceptual activity we explored was the time sequence of segments
that appear in films. Typically the camera will be directed to a person or
event for a duration, then the scene flips to another person or event, as the story
unfolds. Think then of the camera as the eye of the film editor, who unfolds the
story, attempting to communicate the salient points. Certainly this process entails
a very high level of perceptual understanding. However, the lengths of the camera
segments are rather easy to measure without excessive subjective bias. For “Out
of Africa”, “Chariots of Fire”, “Star Wars” and “ET”, 200 to 400 segments were
recorded. These films were chosen to represent a spectrum of fast-to-slow paced
films. The “Out of Africa” analysis is presented in the fourth panel of Figure 14.
As summarized in Table 3, it can be seen that the other three films were similar,
again suggesting an underlying dynamical system of dimension about 3 to 4, corrupted
by stochastic noise as the embedding dimension exceeded 5. [Of interest is
that “Out of Africa” also yielded an embedding dimension of less than three for the
upper component of the C(r) vs r plot.]^

Discussion 

If one views the brain as a society of competing neurons or neural processes, then
their characterization as a dynamical system is not surprising. Indeed, Poston and
Stewart (1978) proposed such a model for multistable percepts and such behavior
has previously been reported for adaptive systems and neural activity with dimensionality
estimates ranging from 3 to 7 (Altman 1991; Priesol et al. 1991; Skarda
and Freeman 1987; Xu and Li 1986). It is reassuring that the observed dimensionality
of their conscious counterpart - our percepts - falls in this range. What is
surprising is that our estimate lies at the lower end with little variance. A dimension
of less than four for any dynamical system offers a reasonable possibility for model
development and study.

A second surprise is that regardless of the complexity of the perceptual task,
when chaotic behavior is plausible, the inferred dimensionality remains roughly the
same. It is difficult to believe that the processes underlying the reversals in the
crater and the temporal sequence of segments in the movies share the same neural
machinery to any great degree. Yet they both exhibit the same dimensionality! One
type of process that would elicit such a result is one that is organized in a hierarchical
manner having a fractal composition with roughly the same number of components
collected together in each level of the hierarchy. In other words, in such a scheme,
neural subsystems would compete for the attention of their “parental” systems, etc.
The fractal dimension common to all tasks would then indicate that each parent
has roughly the same number of “offspring”. Here, we've simply adopted a proposal
made elsewhere for social choice, where each state decision is not a single rational
choice, but rather an aggregation of many hierarchical choice problems (D. Richards
1991). In this domain the fractal dimension also is surprisingly low.

Thirdly, with respect to the eye movement data and results, which are spatial
and not temporal patterns, of interest is that again the dimensionality appears to be
about 3.5. This suggests that perceptual mechanisms may be directing the spatial
pattern of eye movements during the search for a target's appearance. Note that this
“chaotic” search occurs even though the intersaccadic duration intervals are short,
around 200 ms (Sommer 1993). The scanning strategy, then, appears controlled by
a non-random, dynamical spatial representation, but not a temporal one. Again, a


^The  other  film  data  also  suggested  a  lower  dimensional  component  for  values  of  C(r)  > 
.03,  but  this  observation  remains  tentative. 




hierarchical search pattern, where small and large step sizes are nested as in a tree
or branching structure, would be one strategy that could exhibit such deterministic
chaotic behavior.

Finally, although our percepts appear to exhibit some properties consistent
with deterministic chaos, this should not be construed to imply that the percepts
themselves are chaotic! Indeed, the percepts can be quite rational and stable.
However, the state changes underlying the development of these percepts seem characteristic
of a special kind of dynamical system that is well suited for searching
rapidly through a large data base of variable resolution, where many states are to
be evaluated quickly, probably in parallel.


7.0  Spinoffs 


7.1  Texture  Curvature  (with  Hugh  Wilson) 

This study examined curvature discrimination for edges created by texture contours,
and includes a model incorporating end-stopped complex cells. It appears
as “Curvature and separation discrimination at texture boundaries”, Jrl. Opt. Soc.
Am. A, 9: 1653-1661 (1992).

7.2 Shading and Stereo (Dawson & Shashua)

Pseudo stereopsis is when the binocular disparities of a surface, such as a face, are
reversed but the shading is not. The impression is that the face is “normal” - the
nose, for example, still points outward to the viewer.

We have manipulated noses using graphics techniques in order to push them
inward, “into the head” so to speak, without altering the shading. No one is able to
see these noses “shoved in”. Our analysis suggests that this failure of stereopsis is
simply due to the shape-from-shading solution “overriding” (in the Percepts Lattice
sense) the weak stereo signal created by shaded rather than sharp contours. The
effect is not special to faces, and occurs also for “playdo” shapes.


7.3  Configuration  Stereopsis  (Richards) 

We are just winding up a study on 3D shape that relates to how “top-down” information
about fixation distance (or shape) modulates angular disparity. Because
binocular disparity appears to be computed in V2, this modulation must occur early
in the visual pathway and hence is potentially accessible to psychophysical probing.

As the distance to an object increases, the angular disparity needed to measure
the actual 3D configuration must decrease (reaching zero at the horizon). However,
if we take an object, say a cup, and evaluate its 3D shape nearby versus far away,
the cup does not appear to flatten, although the disparity signal becomes much
smaller as the distance increases. This suggests a rescaling of disparity with object
(or fixation) distance.
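
The size of the effect described above can be illustrated with the standard small-angle geometry of stereopsis, in which the angular disparity of a depth interval falls off with the square of the viewing distance. The sketch below is only a worked example of that geometry: the interocular separation of 0.065 m, the cup depth of 0.08 m and the two viewing distances are assumed values chosen for illustration, and the function names are hypothetical.

```python
import math

def angular_disparity(depth_interval, distance, interocular=0.065):
    """Small-angle disparity (radians) of a depth interval viewed at `distance`.

    All lengths are in metres; 0.065 m is an assumed interocular separation.
    """
    return interocular * depth_interval / distance ** 2

def depth_from_disparity(disparity, distance, interocular=0.065):
    """Invert the small-angle relation: depth interval implied by a disparity."""
    return disparity * distance ** 2 / interocular

cup_depth = 0.08                            # assumed front-to-back extent of the cup
near = angular_disparity(cup_depth, 0.5)    # cup viewed at 0.5 m
far = angular_disparity(cup_depth, 5.0)     # same cup viewed at 5 m

print("near disparity (arcmin):", math.degrees(near) * 60)
print("far disparity (arcmin):", math.degrees(far) * 60)

# If the far disparity were read with the near-distance scale, the cup would
# be assigned only a small fraction of its true depth (it would look flattened).
print("depth with no rescaling (m):", depth_from_disparity(far, 0.5))
# Rescaling by the true distance recovers the original depth interval.
print("depth with full rescaling (m):", depth_from_disparity(far, 5.0))
```

Under these assumptions a tenfold increase in distance reduces the disparity a hundredfold, so a fixed disparity-to-depth mapping would make the distant cup look nearly flat. That it does not is what motivates the rescaling hypothesis.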

We have conducted parametric studies of 3D shape from stereo over a wide
range of fixation distances. The data show that indeed, the depth measure associated
with a fixed angular disparity changes with fixation distance. The effect
is in the direction needed to preserve the shape of 3D configurations as their distance
changes, and is roughly two-thirds of what is needed for a full correction.
This is evidence for neural signals being modified at or before the extraction of
binocular disparity. Hence we have a preliminary “handle” on how a simple case of
“bottom-up” information - namely binocular disparity - may incorporate a form of
“top-down” knowledge.


7.4 A Neural Proposal (Ullman)

Ullman (1992) has proposed a network hierarchy scheme for how “bottom-up” information
comes into register with “top-down” models. The basic process, termed
“sequence-seeking”, is a search for a sequence of mappings or transformations linking
a source and target representation. The search is bidirectional throughout the
hierarchy - “bottom-up” as well as “top-down”. The novel part of the proposal is
that the two searches are performed along two separate, complementary pathways,
one ascending, the other descending. When a matching pattern is found, regardless
of the level, then a chain of activity linking the source and target is generated, facilitating
one particular path in the network. The proposal is largely consistent with
what is known about cortical machinery, specifically the interplay between the various
visual areas, and hence is a hypothesis about the basic scheme of information
processing in the neocortex (and thalamus). Experiments related to this proposal
are currently underway.
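
The idea of two searches that start at opposite ends and meet in the middle can be illustrated with an ordinary bidirectional graph search. The sketch below is a toy analogy only, not Ullman's network model: the dictionary of links, the state names and the function `sequence_seek` are all hypothetical, and a breadth-first expansion is assumed for simplicity.

```python
from collections import deque

def sequence_seek(links, source, target):
    """Bidirectional breadth-first search for a chain linking source to target.

    `links` maps each state to the states reachable by one transformation.
    Returns the chain as a list of states, or None if no chain exists.
    """
    if source == target:
        return [source]
    # Reverse map so the descending search can walk transformations backwards.
    reverse = {}
    for a, outs in links.items():
        for b in outs:
            reverse.setdefault(b, []).append(a)
    up_parent, down_parent = {source: None}, {target: None}
    up_frontier, down_frontier = deque([source]), deque([target])

    def expand(frontier, graph, own, other):
        # Expand one level; return the state where the two searches meet, if any.
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for nxt in graph.get(state, []):
                if nxt not in own:
                    own[nxt] = state
                    if nxt in other:
                        return nxt
                    frontier.append(nxt)
        return None

    while up_frontier and down_frontier:
        meet = expand(up_frontier, links, up_parent, down_parent)
        if meet is None:
            meet = expand(down_frontier, reverse, down_parent, up_parent)
        if meet is not None:
            # Ascending half: walk back from the meeting state to the source.
            path, s = [], meet
            while s is not None:
                path.append(s)
                s = up_parent[s]
            path.reverse()
            # Descending half: walk forward from the meeting state to the target.
            s = down_parent[meet]
            while s is not None:
                path.append(s)
                s = down_parent[s]
            return path
    return None

# Hypothetical representations, ordered loosely from image data to a stored model.
links = {
    "image": ["edges"],
    "edges": ["contours", "texture"],
    "contours": ["parts"],
    "texture": [],
    "parts": ["object model"],
}
chain = sequence_seek(links, "image", "object model")
print(chain)
```

In this toy version the ascending search expands from the image and the descending search from the stored model, and the chain is assembled at whichever intermediate state both have reached, loosely mirroring the statement that a match at any level generates a chain of activity linking source and target.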